The application of machine learning to quantum information processing has recently attracted
keen interest, particularly for the optimization of control parameters in quantum tasks without any
pre-programmed knowledge. By adapting the machine learning technique, we present a novel
protocol in which an arbitrarily initialized device at a learner’s location is taught by a provider
located at a distant place. The protocol is designed such that any external learner who attempts to
participate in or disrupt the learning process can be prohibited or noticed. We numerically
demonstrate that our protocol works faithfully for single-qubit operation devices. A trade-off
between the inaccuracy and the learning time is also analyzed.

PACS numbers: 03.67.Hk, 07.05.Mh 


I. INTRODUCTION 

Advances in quantum information science herald a new era of information technology.
Quantum information science has recently penetrated interdisciplinary science and
engineering fields. In particular, a current research topic is to adapt the basic idea
of machine learning for quantum information processing. Although “learning” is a
behavior of humans and other living things, a device or a machine can also learn a task
according to the theory of machine learning, which was developed as a subfield of
artificial intelligence [1]. In fact, the optimization of control parameters without any
pre-programmed knowledge can be regarded as a typical task of machine learning. In this
context, the techniques of machine learning have recently been applied to various
quantum information protocols.

Following this trend, here we formulate an intriguing problem. Suppose that one intends
to construct an operation to execute a particular quantum task. For this purpose, a
quantum machine learning technique can be used to train the operation devices for the
desired task. However, these devices are not necessarily located at the same place as
the one who is designing the task to be taught (called a provider hereafter). To realize
scalable quantum devices or networks, joint work between different parts of a composite
architecture or between separated participants may be necessary. For this purpose,
several protocols of distributed quantum information processing have been developed
[3,Hi-. Therefore, a quantum learning protocol performed by a separated learner and
provider will also be required in some realistic application scenarios.

In this study, we design a protocol to prepare an arbitrary quantum device at a distant
place by machine learning. We first assume an arbitrarily initialized device installed
at one place where the learner (say Alice) is located. The other, spatially separated,
provider (say Bob) determines the target quantum task, which cannot be directly accessed
by Alice. Note that the target information is not disclosed to anyone else. Alice and
Bob mainly use quantum channels to communicate their quantum states. The output state
from the device at Alice’s location is sent to Bob so that he can assess the learning
progress. To obtain feedback from Bob, Alice also sends reference quantum states, and
Bob returns them to Alice after performing his task. In designing such a protocol, we
employ a specific learning algorithm called single measurement and feedback [9]. When
learning is complete, we say that Alice’s operation device has learned to perform the
desired quantum task.

We also consider another issue that will be very important in the related field called
“secure machine learning” [Ml , which highlighted that the machine learning process
itself could be a target of malicious attack. The aforementioned works classified the
possible attack scenarios and the defenses against them, providing theoretical analyses
of the lower bound on the attacker’s work function. Here we approach this issue in a
quantum manner, focusing on the scenario where Alice and Bob do not want any other
external learner. Thus, we design the protocol such that any malicious attempts to
participate in or disturb the learning can be prohibited or noticed, as long as Alice’s
learning elements (i.e., controllable unitary and measurement devices) are not initially
correlated. We will demonstrate by Monte Carlo simulations that our protocol works well
when learning tasks for qubit states. The learning time and inaccuracy are also analyzed
in the demonstration.


II. CONCEPT & METHOD 

Here we describe our scenario for developing a remote learning protocol. Suppose that
two separated parties, Alice and Bob, intend to teach a device at Alice’s location to
perform a quantum task. The target quantum task learned by the device can generally be
identified as a unitary transformation from a given initial state $|\chi_A\rangle$ to a
specific final state $|\tau_B\rangle$ determined by Bob, i.e., the provider. Alice and
Bob communicate through quantum and classical channels. The process of our protocol is
illustrated in Fig. 1. The tasks performed by Alice and Bob and the channels are
described in detail below.

FIG. 1: (Color online) Schematic picture of our protocol, with the learner (Alice) and
the provider (Bob). Alice prepares a (fiducial) state $|\chi_A\rangle$ (which is also
known to Bob) and initializes her own (unitary) device $\hat{U}$ for learning. Bob
determines the state $|\tau_B\rangle$ of the target (which is known only to Bob) at a
distant place so that Alice’s device $\hat{U}$ learns a desired quantum operation (see
the main text for details).

(i) Alice’s elements - Alice prepares a controllable device $\hat{U}$ to learn a unitary
transformation task from a fiducial state $|\chi_A\rangle$ (known only to Alice and
Bob). Here $\hat{U}$ can be expressed as the unitary operator

    $\hat{U}(\mathbf{a}) = e^{-i \mathbf{a}^{T} \mathbf{G}}$,    (1)

where $\mathbf{a} = (a_1, a_2, \ldots, a_{d^2-1})^T$ is a $(d^2-1)$-dimensional (real)
vector, and $\mathbf{G} = (\hat{g}_1, \hat{g}_2, \ldots, \hat{g}_{d^2-1})^T$ is a vector
operator whose components are SU($d$) group generators [13, 14]. We assume that $d$ is
the dimension of the Hilbert space of both $|\tau_B\rangle$ and $|\chi_A\rangle$. In the
process, Alice controls the components $a_j \in [-\pi, \pi]$ ($j = 1, 2, \ldots, d^2-1$)
of the vector $\mathbf{a}$ [25]. Measurement devices and a feedback system to update the
control parameters according to a learning algorithm are also placed on Alice’s side.
Alice is also prepared to generate either $|c\rangle$ ($c = 0, 1$) or $|\pm\rangle$,
which will be used as a reference state in our protocol. For each trial, Alice sends to
Bob both her output state, obtained by applying $\hat{U}$ to the state
$|\chi_A\rangle$, and a reference state.

(ii) Quantum channels - Alice and Bob are connected by three one-way quantum channels
(drawn as gray lines in Fig. 1). Two of the channels are from Alice to Bob
($C_r^{AB}$ and $C_o^{AB}$), and the remaining one is from Bob to Alice ($C^{BA}$). The
channel $C_r^{AB}$ carries the reference states, either $|c\rangle$ ($c = 0, 1$) or
$|\pm\rangle$, and $C_o^{AB}$ transmits Alice’s output states to Bob. The channel
$C^{BA}$ is used to deliver the reference state from Bob’s task back to Alice.

(iii) Bob’s elements - Bob, the provider, determines the target state $|\tau_B\rangle$
(known only to Bob) and prepares it for each trial. Note that Bob does not transmit any
information on the target state $|\tau_B\rangle$ directly to Alice. After receiving
Alice’s output state and a reference state, Bob operates a full-fledged quantum module,
which consists of two Hadamard gates $\hat{H} = (\hat{\sigma}_x + \hat{\sigma}_z)/\sqrt{2}$
and a control-swap (C-SWAP) gate, as illustrated in Fig. 1. The C-SWAP gate acts as
$\hat{C}_{\rm swap} = |0\rangle\langle 0| \otimes \mathbb{1}_{d^2} + |1\rangle\langle 1| \otimes \hat{S}$,
where $\mathbb{1}_{d^2}$ is the $d^2$-dimensional identity, and $\hat{S}$ is a swap
operator, defined as $\hat{S}|x\rangle|y\rangle = |y\rangle|x\rangle$ [15], [iq.
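
As a minimal sketch of the parametrization in item (i), the following Python code builds one standard (generalized Gell-Mann) set of SU($d$) generators and checks the properties listed in Appendix A. The function names `generators` and `trace_product` are hypothetical, and the sign conventions are an assumption, chosen so that $d = 2$ reproduces the Pauli operators.

```python
import math


def generators(d):
    """Return d*d - 1 traceless Hermitian d x d matrices as nested lists."""
    def zero():
        return [[0j] * d for _ in range(d)]

    gens = []
    for j in range(d):
        for k in range(j + 1, d):
            u, v = zero(), zero()
            u[j][k] = u[k][j] = 1 + 0j      # symmetric: P_jk + P_kj
            v[j][k], v[k][j] = -1j, 1j      # antisymmetric: -i (P_jk - P_kj)
            gens += [u, v]
    for l in range(1, d):
        w = zero()
        coeff = math.sqrt(2.0 / (l * (l + 1)))
        for i in range(l):
            w[i][i] = coeff
        w[l][l] = -l * coeff                # makes the diagonal matrix traceless
        gens.append(w)
    return gens


def trace_product(a, b):
    """tr(a b) for square nested-list matrices."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


if __name__ == "__main__":
    for d in (2, 3):
        g = generators(d)
        # Number of generators and tr(g_j g_j), expected to equal 2 for every j
        print(d, len(g), [round(trace_product(x, x).real, 6) for x in g])
```

For $d = 2$ the list returned by `generators(2)` is, in order, the Pauli matrices $\hat{\sigma}_x$, $\hat{\sigma}_y$, and $\hat{\sigma}_z$ in this convention.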

We now illustrate how our protocol runs. First, Alice publicly declares the commencement
to Bob. Here, the fiducial state $|\chi_A\rangle$ is one element of a predetermined set
of initial states, which are agreed upon only by Alice and Bob in advance [26]. Bob then
determines the target state $|\tau_B\rangle$ according to the input $|\chi_A\rangle$ and
informs Alice that he is also ready. When Alice and Bob identify their signs, [23 the
process starts:

[P.1] For every trial, Alice generates a reference state, either $|c\rangle$
($c = 0, 1$) or $|\pm\rangle$. For the $|c\rangle$ state, Alice applies the learning
unitary operator $\hat{U}(\mathbf{a})$ to her input state as

    $|\chi_A\rangle \to |\tau_A(\mathbf{a})\rangle$,    (2)

where $\mathbf{a}$ is selected on the basis of Alice’s learning algorithm. Note that
$\mathbf{a}$ is initially chosen at random. For either $|+\rangle$ or $|-\rangle$, Alice
applies a random unitary operator $\hat{U}(\mathbf{r}_h)$, such that

    $|\chi_A\rangle \to |\chi_A(\mathbf{r}_h)\rangle$,    (3)

where $\mathbf{r}_h = (r_{h,1}, r_{h,2}, \ldots, r_{h,d^2-1})^T$ is a randomly generated
vector (known only to Alice). Thus, the states $|\tau_A(\mathbf{a})\rangle$ and
$|\chi_A(\mathbf{r}_h)\rangle$ are sequentially changed in each trial, depending on the
choice of reference states. Alice sends both the reference state and the output state,
prepared as either $|c\rangle_r |\tau_A(\mathbf{a})\rangle_o$ or
$|\pm\rangle_r |\chi_A(\mathbf{r}_h)\rangle_o$, to Bob via $C_r^{AB}$ and $C_o^{AB}$,
respectively. Here, we use the subscripts “r” and “o” to denote the reference and output
modes, respectively. Note that Alice does not disclose the states that are being sent.

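[P.2] Then, Bob applies the delivered state $|\psi_{A \to B}\rangle$ and the target
state $|\tau_B\rangle_t$ to his module, where the subscript “t” denotes the target mode.
It yields the state $|\Psi_{\rm comp}\rangle$ as

    $|\psi_{A \to B}\rangle |\tau_B\rangle_t \xrightarrow{(\hat{H} \otimes \mathbb{1}_{d^2}) \, \hat{C}_{\rm swap} \, (\hat{H} \otimes \mathbb{1}_{d^2})} |\Psi_{\rm comp}\rangle$.    (4)

Here, for $|\psi_{A \to B}\rangle = |c\rangle_r |\tau_A(\mathbf{a})\rangle_o$, the output
state $|\Psi_{\rm comp}\rangle$ is given as

    $|\Psi_{\rm comp}\rangle = \sum_{k=0,1} |k\rangle_r \frac{|\tau_A(\mathbf{a})\rangle_o |\tau_B\rangle_t + (-1)^{c \oplus k} |\tau_B\rangle_o |\tau_A(\mathbf{a})\rangle_t}{2}$,    (5)

whereas for $|\psi_{A \to B}\rangle = |\pm\rangle_r |\chi_A(\mathbf{r}_h)\rangle_o$, we
have

    $|\Psi_{\rm comp}\rangle = |+\rangle_r |\chi_A(\mathbf{r}_h)\rangle_o |\tau_B\rangle_t$ or $|\Psi_{\rm comp}\rangle = |-\rangle_r |\tau_B\rangle_o |\chi_A(\mathbf{r}_h)\rangle_t$.    (6)

Note again that only Alice knows whether the output $|\Psi_{\rm comp}\rangle$ is equal
to Eq. (5) or Eq. (6). After performing his task, Bob resends the reference state,
written as $\hat{\rho}_{\rm ref} = {\rm Tr}_{o,t} |\Psi_{\rm comp}\rangle\langle\Psi_{\rm comp}|$,
back to Alice through $C^{BA}$.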

[P.3] Then, Alice checks the returning state $\hat{\rho}_{\rm ref}$ as follows: First,
if the prepared reference state was $|+\rangle$ or $|-\rangle$, Alice performs the
measurement $M_\pm$ with the bases $\{|+\rangle, |-\rangle\}$ on $\hat{\rho}_{\rm ref}$.
Note that Bob’s operation does not alter the reference states $|+\rangle$ and
$|-\rangle$ [see Eq. (6)]. Thus, if an unexpected outcome, i.e., “−” (or “+”) for the
initially prepared reference state $|+\rangle$ (or $|-\rangle$), appears in $M_\pm$,
Alice can immediately notice that the state transmitted in $C_r^{AB}$ or $C^{BA}$ has
been altered by an external learner. Second, for the reference state $|c\rangle$
($c = 0, 1$), Alice applies the operation
$\hat{\sigma}_x^c = (|1\rangle\langle 0| + |0\rangle\langle 1|)^c$ to the returned state
$\hat{\rho}_{\rm ref}$ and performs the measurement $M_{0/1}$ with the bases
$\{|0\rangle, |1\rangle\}$. In this case, the measurement results are delivered to the
feedback system for effective quantum learning.

By iterating steps [P.1]-[P.3], Alice’s device $\hat{U}(\mathbf{a})$ is
supposed to learn the desired task,

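    $\hat{U}(\mathbf{a}_{\rm opt}) |\chi_A\rangle = |\tau_A(\mathbf{a}_{\rm opt})\rangle \simeq |\tau_B\rangle$,    (7)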

where $\mathbf{a}_{\rm opt}$ denotes the optimal vector achieved after learning is
complete. To realize this learning process, we can use the following property: If
$|\tau_A(\mathbf{a}_{\rm opt})\rangle = |\tau_B\rangle$, Bob’s output state
$|\Psi_{\rm comp}\rangle$ for the reference state $|c\rangle$ is to be
$|0\rangle_r |\tau_B\rangle_o |\tau_B\rangle_t$ just before the measurement $M_{0/1}$
[see Eq. (5)], so Alice cannot obtain the outcome of $|1\rangle$. More generally, the
probability $\Pr(k|\mathbf{a})$ that Alice measures $|k\rangle$ ($k = 0, 1$) in
$M_{0/1}$ can be calculated as

    $\Pr(k|\mathbf{a}) = \frac{1 + (-1)^k f(\mathbf{a})}{2}$,    (8)

where $f(\mathbf{a}) = |\langle \tau_B | \tau_A(\mathbf{a}) \rangle|^2$. Our learning
strategy is thus to update $\hat{U}(\mathbf{a})$ until $|0\rangle$ is successively
measured, without any single outcome of $|1\rangle$, in $M_{0/1}$. This strategy is
conceptually equivalent to the maximization of $f$.
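
The outcome statistics of Eq. (8) can be checked numerically for $d = 2$. The sketch below simulates Bob's module (Hadamard, C-SWAP, Hadamard on the reference qubit) followed by Alice's bit flip for $c = 1$, using a plain state vector. All function names are hypothetical, and the qubit ordering (reference, output, target) is a convention chosen here.

```python
import math
import random


def hadamard_on_reference(psi):
    """Apply H to the reference qubit (the most significant of three qubits)."""
    s = 1 / math.sqrt(2)
    return ([s * (psi[r] + psi[4 + r]) for r in range(4)]
            + [s * (psi[r] - psi[4 + r]) for r in range(4)])


def cswap(psi):
    """Swap the output and target qubits when the reference qubit is |1>."""
    out = list(psi)
    out[5], out[6] = psi[6], psi[5]      # |1>|0>|1> <-> |1>|1>|0>
    return out


def outcome_probs(c, tau_a, tau_b):
    """Pr(k|a) for k = 0, 1, given reference |c>, output tau_a, target tau_b."""
    psi = [0j] * 8
    for x in range(2):
        for y in range(2):
            psi[4 * c + 2 * x + y] = tau_a[x] * tau_b[y]
    psi = hadamard_on_reference(cswap(hadamard_on_reference(psi)))
    probs = [sum(abs(z) ** 2 for z in psi[:4]),
             sum(abs(z) ** 2 for z in psi[4:])]
    return probs[::-1] if c == 1 else probs   # Alice's bit flip for c = 1


def random_qubit(rng):
    """Random normalized qubit state from complex Gaussian amplitudes."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / norm for z in v]


def overlap(a, b):
    """f = |<b|a>|^2."""
    return abs(sum(y.conjugate() * x for x, y in zip(a, b))) ** 2


if __name__ == "__main__":
    rng = random.Random(7)
    tau_a, tau_b = random_qubit(rng), random_qubit(rng)
    f = overlap(tau_a, tau_b)
    # Simulated probabilities next to the prediction of Eq. (8)
    print(outcome_probs(0, tau_a, tau_b), [(1 + f) / 2, (1 - f) / 2])
```

When `tau_a` equals `tau_b`, the probability of the outcome $|1\rangle$ vanishes, which is the property the learning strategy relies on.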


III. LEARNING ALGORITHM 

To realize the above-mentioned strategy, we employ the quantum learning algorithm based
on single measurement and feedback introduced in Ref. [9]. This algorithm requires a
finite $N_L$-bit classical first-in-first-out (FIFO) memory in which the measurement
results are recorded as “fail” or “not-fail” data. Note that, as the memory size is
finite, the newest data have to push the oldest data out of the memory (see Fig. 2).
Thus, the memory retains the latest data for the learning process.

FIG. 2: Schematic picture of the use of FIFO memory, ordered from the oldest to the
newest data, to record the measurement outcome “fail” or “not-fail” (see the main text).

In our case, the learning algorithm is programmed in Alice’s feedback system with the
rule for updating the vector $\mathbf{a}$ of $\hat{U}$. The learning algorithm runs as
follows: If Alice measures $|0\rangle$ in $M_{0/1}$ (that is, “not-fail”), the feedback
system reserves judgment regarding whether the current $\hat{U}(\mathbf{a})$ is
appropriate and thus leaves the vector $\mathbf{a}$ unchanged. Otherwise, if $|1\rangle$
is measured (that is, “fail”), $\mathbf{a}$ is updated according to

    $\mathbf{a}^{(n)} \to \mathbf{a}^{(n+1)} = \mathbf{a}^{(n)} + \frac{N_F}{N} \mathbf{r}_n$,    (9)

where $n$ denotes the number of iterations of the effective learning process (or the
total number of measurements $M_{0/1}$ performed), $\mathbf{r}_n$ is a vector randomly
generated at the $n$th iteration step, and $N = \min(N_L, N_F + N_{NF})$. Here, $N_F$
and $N_{NF}$ are the numbers of “fail” and “not-fail” data recorded in the memory,
respectively. Our learning algorithm is intuitively understandable: The greater the
number of “fail” events is, the more changes are imposed. Note that the random vector
$\mathbf{r}_n$, rather than any pre-programmed knowledge, is used to develop
$\mathbf{a}$. This feature, i.e., using no pre-programmed knowledge, is a typical trait
of “learning” in a broad sense, and is of particular importance in our task, as it
implies that no information about the target $|\tau_B\rangle$ is directly referenced to
find the optimal vector $\mathbf{a}_{\rm opt}$.
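
The feedback rule can be sketched for a single qubit as follows. This is an illustration under stated assumptions, not the simulation code used for the results: it assumes the update $\mathbf{a} \to \mathbf{a} + (N_F/N)\,\mathbf{r}$ with each component of $\mathbf{r}$ drawn uniformly from $[-\pi, \pi]$, wraps the components back into $[-\pi, \pi]$, takes $|0\rangle$ as the fiducial state, and uses $\hat{U}(\mathbf{a}) = e^{-i\mathbf{a}\cdot\boldsymbol{\sigma}}$ in closed form. The names `tau_a`, `fidelity`, `wrap`, and `learn` are hypothetical.

```python
import math
import random
from collections import deque


def tau_a(a):
    """U(a)|0> for a qubit, with U(a) = exp(-i a.sigma) written in closed form."""
    norm = math.sqrt(sum(x * x for x in a))
    if norm == 0.0:
        return [1 + 0j, 0j]
    nx, ny, nz = (x / norm for x in a)
    c, s = math.cos(norm), math.sin(norm)
    # First column of cos|a| 1 - i sin|a| (n.sigma)
    return [complex(c, -s * nz), complex(s * ny, -s * nx)]


def fidelity(a, target):
    """f(a) = |<target|tau_A(a)>|^2."""
    amp = sum(t.conjugate() * x for t, x in zip(target, tau_a(a)))
    return abs(amp) ** 2


def wrap(x):
    """Map an angle back into [-pi, pi) (an assumption of this sketch)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def learn(target, n_l, rng, max_iter=100_000):
    """Run the feedback loop; return (a, iterations used, halted flag)."""
    a = [rng.uniform(-math.pi, math.pi) for _ in range(3)]   # random start
    memory = deque(maxlen=n_l)           # FIFO: the oldest entry is pushed out
    for n in range(1, max_iter + 1):
        fail = rng.random() < (1 - fidelity(a, target)) / 2  # Eq. (8), k = 1
        memory.append(fail)
        n_f = sum(memory)                # number of "fail" entries
        if fail:
            weight = n_f / len(memory)   # N_F / N, N = min(N_L, N_F + N_NF)
            a = [wrap(x + weight * rng.uniform(-math.pi, math.pi)) for x in a]
        elif len(memory) == n_l and n_f == 0:
            return a, n, True            # halting condition: no "fail" in memory
    return a, max_iter, False


if __name__ == "__main__":
    rng = random.Random(0)
    target = [1 / math.sqrt(2), 1j / math.sqrt(2)]
    a_final, steps, halted = learn(target, 100, rng)
    print(halted, steps, 1 - fidelity(a_final, target))
```

Because the outcomes are random, the number of iterations and the final error vary from run to run; only averages over many runs are comparable with the simulation results reported in the text.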

The learning process is continued until all the “fail” data are eliminated in the $N_L$
memory blocks. We call this the halting condition. After learning is complete, i.e., the
halting condition is satisfied, Alice’s final output state
$|\tau_A(\mathbf{a}_{\rm opt})\rangle$ is supposed to be well matched to the target
state $|\tau_B\rangle$, with
$f = |\langle \tau_B | \tau_A(\mathbf{a}_{\rm opt}) \rangle|^2 = 1 - \epsilon_L$
($\epsilon_L \ll 1$). Here, we can infer that the learning error $\epsilon_L$ becomes
small for large $N_L$, but a large $N_L$ requires a longer learning time, as explicitly
shown later.

FIG. 3: (Color online) (a) Learning probability $P_L(n)$ and (b) survival probability
$P_S(n)$ for $N_L = 100$. $P_L(n)$ and $P_S(n)$ (red solid line) are obtained by
performing 1000 simulations. In each simulation, the target state $|\tau_B\rangle$ is
randomly chosen. The survival probability $P_S(n)$ is well fitted to the exponential
decay function $e^{-(n + 1 - N_L)/n_c}$ (green dashed line), where $n_c$ is a
characteristic constant that characterizes the average number of effective iterations
$\bar{n}$ required to complete the learning process; $\bar{n} = n_c + N_L$. We obtain
$n_c \simeq 352$ and thus $\bar{n} \simeq 452$. The actual average iteration number in
the simulations is $\simeq 478$.


IV. NUMERICAL ANALYSIS 

We perform numerical simulations to analyze our learning protocol. Here, we consider the
single-qubit target states (i.e., $d = 2$) for a numerical proof-of-principle
demonstration. In the simulations, we investigate mainly the learning and survival
probabilities. The learning probability $P_L(n)$ is defined as the probability that
learning is completed before or at a certain number $n$ of effective iteration steps.
The survival probability $P_S(n)$ is defined as $P_S(n) = 1 - P_L(n)$; thus, it is the
probability that learning is not completed until $n$ [1, 5]. In Fig. 3, we draw $P_L(n)$
and $P_S(n)$ for $N_L = 100$ by averaging over 1000 simulation data. In each simulation,
the target state $|\tau_B\rangle$ is randomly chosen. We find that $P_S(n)$ is well
fitted to the exponential decay function

    $e^{-(n + 1 - N_L)/n_c}$,    (10)

where $n_c$ is a characteristic constant, and $n \ge N_L$ because of the definition of
the halting condition. As $P_L(n)$ is a cumulative distribution function (by
definition), the average number $\bar{n}$ of iterations to complete the (effective)
learning process can be estimated from the characteristic constant $n_c$ as
$\bar{n} = n_c + N_L$. In our case, we obtain $n_c \simeq 352$ by fitting the simulation
data and thus $\bar{n} \simeq 452$ with $N_L = 100$, whereas the actual average
iteration number counted in the simulations is 478 (see Table I in Appendix B). Note
that $n_c$ has a finite value, which means that learning can be completed in a finite
time. The identified states $|\tau_A(\mathbf{a}_{\rm opt})\rangle$ after learning are
close to their target states, and $\epsilon_L$ is as small as 0.027 on average.

FIG. 4: (Color online) (a) Graph of $N_L$ versus $\bar{n}$ (red circles). We consider
the fitting function $\bar{n} = c_1 N_L^{\alpha}$ (green dashed line) and find that
$c_1 \simeq 0.72$ and $\alpha \simeq 1.39$. (b) $\epsilon_L$ (red circles) with respect
to $N_L$. In this case, the data are well fitted to $\epsilon_L = c_2 N_L^{-\beta}$
(green dashed line) with $c_2 \simeq 1.12$ and $\beta \simeq 0.81$. Each point in (a)
and (b) is obtained by averaging 1000 simulation data.

FIG. 5: (Color online) $\epsilon_L$ versus $\bar{n}$ (red circles). Each point is the
average value of 1000 simulation data; error bars indicate the standard deviation. We
obtain $\epsilon_L \simeq 1.10 \times$ by data fitting (green dashed line).
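
The estimate $\bar{n} = n_c + N_L$ can be checked with a short calculation. This is a sketch that assumes the fitted form of Eq. (10) holds exactly for all $n \ge N_L$ and that $P_S(n) = 1$ for $n < N_L$, as the halting condition implies; the random variable $T$ (the iteration at which learning halts) is notation introduced here.

```latex
\bar{n} = \sum_{n \ge 0} \Pr(T > n) = \sum_{n \ge 0} P_S(n)
        = \sum_{n=0}^{N_L - 1} 1 \; + \sum_{n \ge N_L} e^{-(n + 1 - N_L)/n_c}
        = N_L + \frac{1}{e^{1/n_c} - 1}
        \simeq N_L + n_c - \tfrac{1}{2} \qquad (n_c \gg 1).
```

For $N_L = 100$ and $n_c \simeq 352$ this gives $\bar{n} \simeq 452$, in line with the value quoted above.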

For further analysis, simulations are also performed by increasing $N_L$ from 50 to 500
at intervals of 50. In Fig. 4(a), we plot $\bar{n}$ with respect to $N_L$. Each point in
the graph is obtained by averaging 1000 simulation data. The data points are very well
fitted to $\bar{n} = c_1 N_L^{\alpha}$ with $c_1 \simeq 0.72$ and $\alpha \simeq 1.39$
(for details of the fitting function, see Appendix C). We also plot the learning error
$\epsilon_L$ (averaged over 1000 data) in Fig. 4(b). The data points are also well
fitted to $\epsilon_L = c_2 N_L^{-\beta}$, and we find $c_2 \simeq 1.12$ and
$\beta \simeq 0.81$. From these results, we can see the trade-off relation between the
inaccuracy (i.e., $\epsilon_L$) and the learning time (i.e., $\bar{n}$) depending on
$N_L$. To see this more clearly, we draw the graph of $\epsilon_L$ versus $\bar{n}$ in
Fig. 5 (see Appendix B). By data fitting, we obtain $\epsilon_L \simeq 1.10 \times$
(green dashed line in Fig. 5).
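
As a quick arithmetic check, the two fitted power laws can be evaluated against the simulation data listed in Appendix B. The sketch below only re-evaluates numbers quoted in the text; the names `fitted_iterations`, `fitted_error`, and `TABLE_I` are hypothetical.

```python
# Fit parameters quoted in the text
C1, ALPHA = 0.72, 1.39      # average iterations ~ C1 * N_L**ALPHA
C2, BETA = 1.12, 0.81       # learning error     ~ C2 * N_L**(-BETA)

# (N_L, average iterations counted in the simulations, learning error), Table I
TABLE_I = [
    (50, 195, 0.04727), (100, 478, 0.02690), (150, 872, 0.01964),
    (200, 1257, 0.01505), (250, 1658, 0.01268), (300, 2111, 0.01089),
    (350, 2754, 0.00981), (400, 3125, 0.00882), (450, 3806, 0.00836),
    (500, 4532, 0.00760),
]


def fitted_iterations(n_l):
    """Fitted average number of iterations for memory size n_l."""
    return C1 * n_l ** ALPHA


def fitted_error(n_l):
    """Fitted learning error for memory size n_l."""
    return C2 * n_l ** (-BETA)


if __name__ == "__main__":
    for n_l, n_sim, eps in TABLE_I:
        print(n_l, round(fitted_iterations(n_l)), n_sim,
              round(fitted_error(n_l), 5), eps)
```

Larger $N_L$ lowers the fitted error while raising the fitted iteration count, which is the trade-off described above.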


V. DISCUSSIONS ON THE SECURITY 

We briefly discuss the security of our learning protocol against any other external
learner (say Eve). One may explore broad questions related to the security of machine
learning. Here, we consider a specific question: ‘Can Eve learn the quantum task
originally designed by Bob without being discovered?’ To deal with this question, we
consider two scenarios.


A. Scenario 1: intercept-and-resend attack 

We first note that the target state $|\tau_B\rangle$ is neither directly moved to Alice
nor removed from Bob’s side. Note further that the optimized vector
$\mathbf{a}_{\rm opt}$ cannot be viewed on Alice’s side after learning is complete.
Thus, a strategy that Eve could follow would be to intercept the transmitted particles
in the channels and to learn $|\tau_B\rangle$ or $|\tau_A(\mathbf{a}_{\rm opt})\rangle$
from the intercepted particles. Eve then attempts to resend copies of the particles
instead of the stolen ones so that Alice and Bob would not notice it. This, often called
an “intercept-and-resend attack,” is a typical scheme for breaking a QKD system.
However, this is quite formidable owing to the following complications:

[SC.1] If the qubit states transmitted through $C_r^{AB}$ or $C^{BA}$ are altered, Alice
immediately perceives the alterations by the measurement $M_\pm$, as described above.
This method of using a “cheat-sensitive” (sub)system is often used in quantum
cryptographic tasks.

[SC.2] Even though Eve can intercept the states moving through the channels without
being discovered, it is still impossible to learn $|\tau_B\rangle$ or
$|\tau_A(\mathbf{a}_{\rm opt})\rangle$ because the intercepted particles,
$|\tau_A(\mathbf{a})\rangle\langle\tau_A(\mathbf{a})|$ and
$|\chi(\mathbf{r}_h)\rangle\langle\chi(\mathbf{r}_h)|$, are highly mixed and
indistinguishable. Actually, in such a case, the state of $N_{\rm int}$ intercepted
particles is close to the random mixture $\mathbb{1}_d/d$ when $N_{\rm int} \gg 1$
because $\mathbf{a}$ and $\mathbf{r}_h$ are continuously changed in each trial of the
learning process.

[SC.3] We finally note that learning is very sensitive to any external alteration of
Alice’s estimation states $|\tau_A(\mathbf{a})\rangle$ transmitted in $C_o^{AB}$ (see
Appendix D). Thus, even for a super-Eve who can sort out $|\tau_A(\mathbf{a})\rangle$ in
$C_o^{AB}$, Alice can be aware of any ill-intentioned attempts by monitoring the
learning time; any alteration is indicated by learning that is too slow or cannot be
completed, even though unexpected outcomes do not appear in $M_\pm$.
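
The statement in [SC.2] that many intercepted particles look like the random mixture can be illustrated numerically for $d = 2$. The sketch below makes a simplifying assumption that is not taken from the protocol: every intercepted state is drawn independently and uniformly (Haar-random) from the set of pure qubit states. The function names are hypothetical.

```python
import math
import random


def random_qubit(rng):
    """Haar-random pure qubit state from normalized complex Gaussian amplitudes."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / norm for z in v]


def average_density_matrix(n_int, rng):
    """Average of |psi><psi| over n_int independently drawn states."""
    rho = [[0j, 0j], [0j, 0j]]
    for _ in range(n_int):
        psi = random_qubit(rng)
        for i in range(2):
            for j in range(2):
                rho[i][j] += psi[i] * psi[j].conjugate() / n_int
    return rho


def distance_from_mixture(rho):
    """Largest entry-wise deviation of rho from the 2 x 2 identity divided by 2."""
    return max(abs(rho[i][j] - (0.5 if i == j else 0.0))
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    rng = random.Random(0)
    for n_int in (1, 100, 10_000):
        print(n_int, distance_from_mixture(average_density_matrix(n_int, rng)))
```

The deviation shrinks as the number of intercepted particles grows, so the ensemble carries almost no information about any single transmitted state.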


B. Scenario 2: man-in-the-middle attack 

We then consider another scenario, called a “man-in-the-middle attack”, where Eve
communicates with Alice pretending to be Bob, and at the same time performs the learning
with Bob pretending to be Alice over the public channels. In such an attack, Eve can
guide Alice’s unitary device(s) into an irrelevant task, e.g.,
$|\chi_A\rangle \to |\tau_E\rangle$, and can extract Bob’s target state
$|\tau_B\rangle$ from the identified task, e.g., $|\chi_E\rangle \to |\tau_B\rangle$, in
the learning with Bob.p^

Nevertheless, it is impossible for Eve to learn the target task, i.e.,
$|\chi_A\rangle \to |\tau_B\rangle$, since Alice’s input state $|\chi_A\rangle$ is not
disclosed. We thus note that in this sense Eve’s strategy to learn the original task
designed by Bob will end in failure.

FIG. 6: (Color online) The modification of the original protocol to guard against a
man-in-the-middle attack is done by placing a control-T operation, defined by
$|0\rangle_r\langle 0| \otimes \mathbb{1}_o + |1\rangle_r\langle 1| \otimes \hat{T}_o$
(red dashed box), on Bob’s side and by a small change of the rule [P.1] on Alice’s side
(see the main text for details).

However, because Eve can still maliciously interfere with the learning process to
separate the two legitimate parties, Alice and Bob, a strategy to detect a
man-in-the-middle attack may be necessary. For this purpose, we can modify our protocol
slightly further: First, Bob mounts a safeguard, identified as a controlled operation
$|0\rangle_r\langle 0| \otimes \mathbb{1}_o + |1\rangle_r\langle 1| \otimes \hat{T}_o$,
in front of the C-SWAP (see Fig. 6). Here, $\hat{T}_o$ is an example operation of the
target task, i.e., $\hat{T}_o |\chi_A\rangle = |\tau_B\rangle$ [30]. Then, Alice changes
the rule [P.1] slightly such that, in case the reference state is $|1\rangle$, Alice
sends the state $|\chi_A\rangle$ to Bob without any alteration so that the state
delivered to Bob is $|\psi_{A \to B}\rangle = |1\rangle_r |\chi_A\rangle_o$. In this
case, Bob yields the final output state $|\Psi_{\rm out}\rangle$, by applying his
module, as

    $|\psi_{A \to B}\rangle |\tau_B\rangle_t \to |\Psi_{\rm out}\rangle = |1\rangle_r |\tau_B\rangle_o |\tau_B\rangle_t$,    (11)

where the reference state $|1\rangle_r$ goes back to Alice through $C^{BA}$ [31].
However, Eve can never produce such an output $|\Psi_{\rm out}\rangle$ in Eq. (11) for
the case where $|c\rangle = |1\rangle$, because Eve cannot make a valid example of
$\hat{T}$ without knowing $|\chi_A\rangle$ [sH. Thus, if Eve intrudes into the learning,
an unexpected outcome $|0\rangle$ will appear in Alice’s measurement $M_{0/1}$ when
$|c\rangle = |1\rangle$. Therefore, Alice can detect a man-in-the-middle attack by
monitoring whether the reference state initially prepared in $|1\rangle$ comes back
without changes; a measurement of $|0\rangle$ may indicate the possible existence of a
middle-man, Eve.
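
The detection step can be illustrated with a small state-vector calculation for $d = 2$. This is a sketch under assumptions introduced here: the fiducial and target states, the two example operators `T_VALID` and `T_WRONG`, and the function names are all hypothetical, and the intruder is modeled simply as a party whose controlled operation does not map the fiducial state to the target state it holds.

```python
import math


def hadamard_on_reference(psi):
    """Apply H to the reference qubit (the most significant of three qubits)."""
    s = 1 / math.sqrt(2)
    return ([s * (psi[r] + psi[4 + r]) for r in range(4)]
            + [s * (psi[r] - psi[4 + r]) for r in range(4)])


def returned_zero_probability(t_op, chi_a, target):
    """Probability that a reference prepared in |1> comes back as |0>."""
    # The control-T operation acts on the output mode because the reference is |1>
    out_state = [t_op[0][0] * chi_a[0] + t_op[0][1] * chi_a[1],
                 t_op[1][0] * chi_a[0] + t_op[1][1] * chi_a[1]]
    psi = [0j] * 8
    for x in range(2):
        for y in range(2):
            psi[4 + 2 * x + y] = out_state[x] * target[y]
    psi = hadamard_on_reference(psi)
    psi[5], psi[6] = psi[6], psi[5]          # C-SWAP on |1>|0>|1> and |1>|1>|0>
    psi = hadamard_on_reference(psi)
    return sum(abs(z) ** 2 for z in psi[:4])


S = 1 / math.sqrt(2)
CHI_A = [1 + 0j, 0j]                         # fiducial state (hypothetical choice)
TAU_B = [S + 0j, 1j * S]                     # target state (hypothetical choice)
T_VALID = [[S, 1j * S], [1j * S, S]]         # unitary with T|chi_A> = |tau_B>
T_WRONG = [[1j * S, S], [S, 1j * S]]         # unitary built for the wrong input |1>

if __name__ == "__main__":
    print(returned_zero_probability(T_VALID, CHI_A, TAU_B))
    print(returned_zero_probability(T_WRONG, CHI_A, TAU_B))
```

With a valid example operation the reference always returns as $|1\rangle$, whereas a mismatched operation produces the outcome $|0\rangle$ with probability $(1 - f')/2$, where $f'$ is the overlap between the transformed fiducial state and the target held by the responder.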
VI. SUMMARY

In summary, we presented a protocol for quantum machine learning, in which a learner
(Alice) can learn a unitary transformation corresponding to the quantum task determined
by a provider (Bob) at a distant place. We clarify here that the presented method is
also applicable to non-unitary tasks, as a general quantum process can be described by
an overall unitary transformation in a quantum system composed of a main and an extra
system, followed by a partial measurement. In such a case, Alice will learn the overall
unitary with an arbitrarily designed extra system and partial measurement on her side.
What is more remarkable is that our protocol was designed such that an external learner
cannot participate in the learning process. We demonstrated by Monte Carlo simulations
that learning can be faithfully completed for single-qubit target states, and analyzed
the trade-off between the inaccuracy and the learning time. We then briefly discussed
the security issues under the intercept-and-resend and man-in-the-middle attack
scenarios. We expect that our protocol will be developed for realistic applications in
quantum information and cryptography
tasks.

Appendix A: Construction of SU(d) group generators

For any given $d$, we can generally define $\mathbf{G}$ in Eq. (1) by systematically
constructing $(d^2 - 1)$ Hermitian operators as follows [13, 14]:

    $\hat{u}_{jk} = \hat{P}_{jk} + \hat{P}_{kj}$,
    $\hat{v}_{jk} = -i (\hat{P}_{jk} - \hat{P}_{kj})$,
    $\hat{w}_{l} = \sqrt{\frac{2}{l(l+1)}} \left( \sum_{i=1}^{l} \hat{P}_{ii} - l \hat{P}_{l+1,l+1} \right)$,

where $1 \le l \le d - 1$ and $1 \le j < k \le d$. Here,
$\hat{P}_{jk} = |j\rangle\langle k|$ is a general projector. Then, the elements
$\hat{g}_j$ of $\mathbf{G}$ can be given from the set
$\{\hat{u}_{jk}, \hat{v}_{jk}, \hat{w}_{l}\}$, satisfying (i) hermiticity
$\hat{g}_j^{\dagger} = \hat{g}_j$, (ii) tracelessness ${\rm tr}(\hat{g}_j) = 0$, and
(iii) orthogonality ${\rm tr}(\hat{g}_j \hat{g}_k) = 2\delta_{jk}$. The elements
$\hat{g}_j, \hat{g}_k \in \mathbf{G}$ satisfy the relation

    $[\hat{g}_j, \hat{g}_k] = 2i \sum_{l} f_{jkl} \, \hat{g}_l$,    (A1)

where $f_{jkl}$ is the (antisymmetric) structure constant of the SU($d$) algebra. Here,
if $d = 2$ (single qubit), we have the Pauli spin operators as
$\mathbf{G} = (\hat{\sigma}_x, \hat{\sigma}_y, \hat{\sigma}_z)^T$.

Appendix B: Detailed data in Figs. 4 and 5

Here we provide the detailed data in Figs. 4 and 5. By performing numerical simulations
while increasing $N_L$ from 50 to 500 at intervals of 50, we characterize the learning
probabilities $P_L(n)$ and survival probabilities $P_S(n)$. The simulations are
performed 1000 times for each $N_L$. For all the cases of $N_L$, the survival
probabilities $P_S(n)$ are well fitted to the fitting function
$e^{-(n + 1 - N_L)/n_c}$ [as in Eq. (10)] with the characteristic constant $n_c$. The
parameters $n_c$ and the (estimated) average number of iterations
$\bar{n} = N_L + n_c$ are listed in Table I. Here, $\bar{n}_{\rm sim}$ denotes the
average number of iterations actually counted in the simulations. We also find the
learning error $\epsilon_L$ (averaged over 1000 simulations) for each $N_L$. The
identified values of $\epsilon_L$ are also given in Table I. We note again that the
fitting parameters $n_c$ have finite values for all cases. We thus expect that learning
can be completed faithfully for the given $N_L$.

TABLE I: Values of $n_c$, $\bar{n}$ ($\bar{n}_{\rm sim}$), and $\epsilon_L$ in Figs. 4
and 5.

    N_L     n_c       n = N_L + n_c  (n_sim)     epsilon_L
    50      ~ 143     ~ 193   (~ 195)            ~ 0.04727
    100     ~ 352     ~ 452   (~ 478)            ~ 0.02690
    150     ~ 718     ~ 868   (~ 872)            ~ 0.01964
    200     ~ 996     ~ 1196  (~ 1257)           ~ 0.01505
    250     ~ 1365    ~ 1615  (~ 1658)           ~ 0.01268
    300     ~ 1711    ~ 2011  (~ 2111)           ~ 0.01089
    350     ~ 2176    ~ 2526  (~ 2754)           ~ 0.00981
    400     ~ 2478    ~ 2878  (~ 3125)           ~ 0.00882
    450     ~ 3207    ~ 3657  (~ 3806)           ~ 0.00836
    500     ~ 3758    ~ 4258  (~ 4532)           ~ 0.00760



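Appendix C: Approximation of $n_c$ in a random learning strategy

Here we approximately estimate $n_c$ in a random learning strategy. To this end, we
first consider the probability $\Pr(0|\mathbf{a})^{N_L}$ that the learning is completed
for any fixed $\mathbf{a}$. $\Pr(0|\mathbf{a})$ is the probability of the success event
(namely, of measuring $|0\rangle$ in $M_{0/1}$) [see Eq. (8)]. To proceed, we introduce
a continuous function,

    $\frac{1}{2} \le S(\mathbf{a}) = \xi_1(a_1) \xi_2(a_2) \cdots \xi_{d^2-1}(a_{d^2-1}) \le 1$,    (C1)

satisfying $S(\mathbf{a} \neq \mathbf{a}_{\rm opt}) < S(\mathbf{a}_{\rm opt}) = 1$. We
note that this function $S(\mathbf{a})$ is made by minimizing
$|S(\mathbf{a}) - P(0|\mathbf{a})|$ for all $\mathbf{a}$. Thus, we infer that
$P(0|\mathbf{a})^{N_L} \to 1$ when $\mathbf{a} \to \mathbf{a}_{\rm opt}$, whereas
$P(0|\mathbf{a})^{N_L} \to 0$ when $\mathbf{a}$ is far from $\mathbf{a}_{\rm opt}$, and
consequently, we can assume that
$P(0|\mathbf{a})^{N_L} \simeq S(\mathbf{a})^{N_L}$ ($\forall \mathbf{a}$) when $N_L$ is
very large.

We then use a trick by approximating $\xi_j(a_j)^K$ with a delta function as

    $\xi_j(a_j)^K \simeq \exp\left[ -\frac{(a_j - a_{j,{\rm opt}})^2}{2\Delta^2} \right]$,    (C2)

where $a_{j,{\rm opt}}$ is a component of $\mathbf{a}_{\rm opt}$, and $K$ is assumed to
be sufficiently large but $K \ll N_L$. Thus, we can also assume that
$S(\mathbf{a})^K \simeq P(0|\mathbf{a})^K$. In this circumstance, we estimate the
average probability $\overline{P(0|\mathbf{a})^{N_L}}$, such that (for
$\Delta \ll |a_j|$)

    $\overline{P(0|\mathbf{a})^{N_L}} \simeq \int da_1 \, \xi_1(a_1)^{N_L} \int da_2 \, \xi_2(a_2)^{N_L} \cdots \int da_{d^2-1} \, \xi_{d^2-1}(a_{d^2-1})^{N_L}$
    $\simeq \prod_{j=1}^{d^2-1} \int_{-\infty}^{\infty} da_j \exp\left[ -\frac{(a_j - a_{j,{\rm opt}})^2}{2\Delta^2} \frac{N_L}{K} \right] = \left( \frac{K 2\pi\Delta^2}{N_L} \right)^{\frac{d^2-1}{2}}$.    (C3)

Then, let us consider the probability that the learning is terminated at the $n$th
iteration step:

    $\left(1 - P(0|\mathbf{a}^{(1)})^{N_L}\right) \left(1 - P(0|\mathbf{a}^{(2)})^{N_L}\right) \cdots \left(1 - P(0|\mathbf{a}^{(n-1)})^{N_L}\right) P(0|\mathbf{a}^{(n)})^{N_L}$,    (C4)

for any sequence
$\mathbf{a}^{(1)} \to \mathbf{a}^{(2)} \to \mathbf{a}^{(3)} \to \cdots \to \mathbf{a}^{(n)}$
of updating the parameter vector in the learning. Thus, in a random learning strategy,
we can approximate the learning probability $P_L(n)$, introduced in Sec. IV, such that

    $P_L(n) \simeq P(0|\mathbf{a}^{(1)})^{N_L}$
    $\quad + \left(1 - P(0|\mathbf{a}^{(1)})^{N_L}\right) P(0|\mathbf{a}^{(2)})^{N_L}$
    $\quad + \left(1 - P(0|\mathbf{a}^{(1)})^{N_L}\right) \left(1 - P(0|\mathbf{a}^{(2)})^{N_L}\right) P(0|\mathbf{a}^{(3)})^{N_L}$
    $\quad + \cdots + \left(1 - P(0|\mathbf{a}^{(1)})^{N_L}\right) \left(1 - P(0|\mathbf{a}^{(2)})^{N_L}\right) \cdots \left(1 - P(0|\mathbf{a}^{(n-1)})^{N_L}\right) P(0|\mathbf{a}^{(n)})^{N_L}$
    $\simeq \sum_{j=0}^{n-1} \left(1 - \overline{P(0|\mathbf{a})^{N_L}}\right)^{j} \overline{P(0|\mathbf{a})^{N_L}} = 1 - \left(1 - \overline{P(0|\mathbf{a})^{N_L}}\right)^{n}$,    (C5)

where
$\overline{P(0|\mathbf{a}^{(1)})^{N_L}} = \overline{P(0|\mathbf{a}^{(2)})^{N_L}} = \cdots = \overline{P(0|\mathbf{a}^{(n)})^{N_L}} = \overline{P(0|\mathbf{a})^{N_L}}$.
Here, using Eq. (C3), we finally arrive at (for $N_L \gg 1$ and $\Delta \ll 1$)

    $P_L(n) \simeq 1 - e^{-n/n_c}$, or equivalently, $P_S(n) \simeq e^{-n/n_c}$.    (C6)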
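
The link between Eqs. (C3), (C5), and (C6) can be made explicit. The following is a sketch that assumes the averaged success probability is small and identifies $n_c$ with its inverse; this identification is an interpretation added here rather than a statement taken from the derivation.

```latex
P_L(n) \simeq 1 - \left(1 - \overline{P(0|\mathbf{a})^{N_L}}\right)^{n}
       \simeq 1 - \exp\!\left(-n\,\overline{P(0|\mathbf{a})^{N_L}}\right),
\qquad
n_c \simeq \frac{1}{\overline{P(0|\mathbf{a})^{N_L}}}
     \simeq \left(\frac{N_L}{2\pi\Delta^{2}K}\right)^{\frac{d^{2}-1}{2}} .
```

Under these assumptions, and treating $\Delta$ and $K$ as independent of $N_L$, a single qubit ($d = 2$) gives $n_c \propto N_L^{3/2}$, which can be compared with the fitted exponent $\alpha \simeq 1.39$ of $\bar{n} = c_1 N_L^{\alpha}$ in Sec. IV.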



Appendix D: Effect on learning of any alterations in $C_o^{AB}$

Here we consider a situation in which particles in the state
$|\tau_A(\mathbf{a})\rangle$ moving through $C_o^{AB}$ are altered with a certain
probability $p_{\rm int}$ by some malicious Eve. Here, we assume a super-Eve who can
sort out Alice’s estimation state $|\tau_A(\mathbf{a})\rangle$, discarding the blinded
state $|\chi_A(\mathbf{r}_h)\rangle$, in $C_o^{AB}$ for his/her own effective learning.
Eve’s aim is to learn Alice’s vector $\mathbf{a}$ and thus to obtain a vector as close
to $\mathbf{a}_{\rm opt}$ as possible when Alice’s learning is complete. Eve can thus
adopt the strategy of learning Alice’s vector $\mathbf{a}$ using a stolen particle for
each trial and resending a newly generated particle in his/her estimated state
$|\tau_E(\mathbf{e})\rangle$ to Bob, where $\mathbf{e}$ is a vector of Eve’s own device.

However, in this case, it takes much longer to complete the learning process because
some particles of $|\tau_A(\mathbf{a})\rangle$ are altered as
$|\tau_A(\mathbf{a})\rangle \to |\tau_E(\mathbf{e})\rangle$. To corroborate this, we
perform numerical simulations of single-qubit target states ($d = 2$). Here, we set
$N_L = 100$ and consider three cases: $p_{\rm int} = 0.1$, 0.2, and 0.3. We assume
further that Eve can use the best strategy for each stolen particle, i.e.,
$|\langle \tau_E(\mathbf{e}) | \tau_A(\mathbf{a}) \rangle|$ = | [I^. In Fig. 7, we
present the learning and survival probabilities for $p_{\rm int} = 0.1$ (red), 0.2
(green), and 0.3 (blue) on a log scale. The survival probabilities are also well matched
to Eq. (10). The data are listed in Table II. Note here that $\bar{n}$ increases
exponentially with increasing alteration probability $p_{\rm int}$. In this sense, the
learning efficiency is very sensitive to the alterations. Thus, by monitoring the
learning time, Alice can sense even a super-Eve; if learning is too slow or cannot be
completed, Alice stops the learning so that Eve cannot complete the process
$\mathbf{e} \to \mathbf{a}_{\rm opt}$.

FIG. 7: (Color online) (a) Learning probability $P_L(n)$ and (b) survival probability
$P_S(n)$ on a log scale, assuming some Eve who can steal particles moving in $C_o^{AB}$
with a certain probability $p_{\rm int}$. We assume that Eve can adopt the best learning
strategy for her learning (see the main text). Here, we set $N_L = 100$ and consider the
qubit target states, i.e., $d = 2$. We consider three cases: $p_{\rm int} = 0.1$ (red),
0.2 (green), and 0.3 (blue). We perform 1000 simulations to draw the graphs. In each
simulation, the target state $|\tau\rangle$ is randomly chosen. The survival
probabilities $P_S(n)$ are also well fitted to Eq. (10) (black solid lines).


TABLE II: Values of $n_c$, $\bar{n}$ ($\bar{n}_{\rm sim}$), and $\epsilon_L$ in Fig. 7.

    p_int    n = N_L + n_c  (n_sim)                epsilon_L
    0.1      ~ 1.736 x 10"^  (~ 1.747 x 10^)       ~ 0.019
    0.2      ~ 1.808 x 10^   (~ 1.956 x 10^)       ~ 0.021
    0.3      ~ 2.473 x 10®   (~ 2.767 x 10®)       ~ 0.022


Here we briefly note that, in a realistic application, Alice should evaluate and analyze
the learning time, i.e., $n_c$, by performing the learning with her own devices before
starting the protocol with Bob. Such a task is carried out taking into account the
errors due to imprecise control or contaminated devices. The maximum tolerable noise in
the channels should also be estimated at this stage.